Branch Classification to Control Instruction Fetch in Simultaneous Multithreaded Architectures

Authors

  • P.M.W. Knijnenburg
  • A. Ramirez
  • F. Latorre
  • J. Larriba
  • M. Valero
Abstract

In Simultaneous Multithreaded (SMT) architectures, many separate threads run concurrently and share processor resources, which yields a high utilization rate of the available hardware. However, this also means that threads compete for resources, and in many cases this competition can degrade overall performance. There are two major causes. First, an instruction that suffers a long-latency data cache miss keeps its dependent instructions from proceeding for many cycles, wasting space in the instruction queues. Second, instructions that belong to a mispredicted path are executed. Both effects harm throughput, and the second also wastes energy. In this paper we propose a fetch policy that avoids issuing instructions to the pipeline when we are not confident that they belong to the correct execution path. In this way, we avoid spending resources on instructions that will not contribute to performance. This fetch policy, called agstall, is based on a dynamic branch classification mechanism. Branch instances are classified as either strongly biased or not strongly biased. We consider all strongly biased branches easy to predict, and we stall the thread on branches that are not strongly biased to avoid mispredicting them. Our results show that agstall achieves similar or better performance than icount, and reduces the number of wrong-path instructions executed by up to 86%.

This research was supported by the EC IST programme (contract HPRI-CT-1999-00071), by the Ministry of Science and Technology of Spain (contract TIC2001-0995), and by CEPBA.
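The stall-on-unbiased-branch idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's mechanism: the abstract does not say how bias is measured, so the `BiasClassifier` class, its `threshold` and `min_samples` parameters, and the `choose_thread` helper are all hypothetical names and values chosen for illustration. Thread selection among non-stalled threads here uses an icount-style rule (fewest in-flight instructions), which is also an assumption about how the two policies might be combined.

```python
class BiasClassifier:
    """Hypothetical per-branch bias tracker (an assumption, not the paper's design)."""

    def __init__(self, threshold=0.9, min_samples=8):
        self.threshold = threshold      # assumed cutoff for "strongly biased"
        self.min_samples = min_samples  # assumed warm-up before trusting a branch
        self.counts = {}                # branch pc -> [not_taken, taken]

    def update(self, pc, taken):
        """Record the resolved outcome of one branch instance."""
        entry = self.counts.setdefault(pc, [0, 0])
        entry[1 if taken else 0] += 1

    def strongly_biased(self, pc):
        """True if one direction dominates the recorded outcomes of this branch."""
        not_taken, taken = self.counts.get(pc, (0, 0))
        total = not_taken + taken
        if total < self.min_samples:
            return False  # unseen or rarely seen branches are treated as not biased
        return max(not_taken, taken) / total >= self.threshold


def choose_thread(in_flight, stalled):
    """Pick the non-stalled thread with the fewest in-flight instructions.

    in_flight maps thread id -> instruction count; returns None if all are stalled.
    """
    runnable = [t for t in in_flight if t not in stalled]
    if not runnable:
        return None
    return min(runnable, key=lambda t: (in_flight[t], t))


# Usage: thread 1 fetches a branch that is not strongly biased, so it is stalled.
clf = BiasClassifier()
for _ in range(10):
    clf.update(0x40, True)        # always taken: strongly biased
for i in range(10):
    clf.update(0x80, i % 2 == 0)  # alternating: not strongly biased

stalled = set()
if not clf.strongly_biased(0x80):
    stalled.add(1)                # thread 1 waits until the branch resolves

next_thread = choose_thread({0: 12, 1: 3}, stalled)  # thread 0 is fetched instead
```

In this sketch a stalled thread would be removed from `stalled` once its branch resolves; that bookkeeping is omitted to keep the example short.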

Similar resources

Effect of Instruction Fetch and Memory Scheduling on GPU Performance

GPUs are massively multithreaded architectures designed to exploit data level parallelism in applications. Instruction fetch and memory system are two key components in the design of a GPU. In this paper we study the effect of fetch policy and memory system on the performance of a GPU kernel. We vary the fetch and memory scheduling policies and analyze the performance of GPU kernels. As part of...

Optimizations Enabled by a Decoupled Front-End Architecture

In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictio...

An Effective Bypass Mechanism to Enhance Branch Predictor for SMT Processors

Unlike traditional superscalar processors, a Simultaneous Multithreaded (SMT) processor can exploit both instruction-level parallelism and thread-level parallelism at the same time. With the same fetch width, SMT fetches instructions from a single thread less deeply than a traditional superscalar processor does. Meanwhile, all the instructions from different threads share the same functional units in SMT. All...

How to Enhance a Superscalar Processor to Provide Hard Real-Time Capable In-Order SMT

This paper describes how a superscalar in-order processor must be modified to support Simultaneous Multithreading (SMT) such that time-predictability is preserved for hard real-time applications. For superscalar in-order architectures the calculation of the Worst Case Execution Time (WCET) is much easier and tighter than for out-of-order architectures. By a careful enhancement that completely i...

Process Prefetching for a Simultaneous Multithreaded Architecture

Traditional superscalar architectures shall eventually prove incapable of taking full advantage of billions of transistors to be available in the future generations of microprocessors if they remain limited by dataflow dependencies. Thus, SMT (Simultaneous Multithreaded) architecture may be a possible solution to this problem, as far as it can fetch and execute a great deal of instruction flows...


Publication date: 2001